This notebook aims to compile and annotate the R scripts used for the analysis of the Eudicot-Botrytis dataset.

This notebook charges R scripts that contain all the code. The main result figures are plotted directly in the notebook.

Experimental design:

Eight Eudicot species were included in this dataset. This includes 7 crops and Arabidopsis thaliana as control. For the crops, six low improvement genotypes (wild, landraces) and six high improvement genotypes (cultivar, inbred lines) were tested. For each species, each genotype was inoculated with the collection of Botrytis isolates and replicated six time over two experiments.

Detached leaves were inoculated with Botrytis in ‘experimental trays’, that constitutes a micro-environment for a randomized collection of isolates. After 72h, pictures of all trays were taken. Image analysis for calculation of lesion area (and many other parameters) was conducted in R.

For image analysis R codes, see the Image_analysis_pipeline_Final R notebook.

1. Create and restructure the dataset.

Read the raw data, do small modifications on names and create variables.

Associated files: FULLMetaDat_7sp.txt

accession.list7sp.txt

Isolate_Host_origin_CORRECTED.txt

source('~/Desktop/Github_Eudicot/1.Creating_Dataset.R')

2. Summarize the data

Calculate the mean, median, standard deviation, minimum, maximum of the lesions at different levels of the dataset. The length is the number of datapoints considered for the summary statistics. This is done at the taxon level, for each Botrytis isolate and for each plant genotype. It is also calculated for pairs of plant genotypes - isolates and taxon-isolates.

source('~/Desktop/Github_Eudicot/2.Summary_RawData.R')
 
  ggplot(data=Summary_Isolate, aes(IsolateID, mean.iso, color=Origin)) +
   geom_point()+
   ylab("mean lesion size in cm2")+
   ggtitle("Isolate variation colored by geographical origin")+
   xlab("Botrytis Isolates")+
   theme(axis.text.x = element_text(angle = 90, hjust = 1))

ggplot(data=Summary_Isolate, aes(IsolateID, mean.iso, color=Host)) +
   geom_point()+
   ylab("mean lesion size in cm2")+
   ggtitle("Isolate variation colored by original host")+
   xlab("Botrytis Isolates")+
   theme(axis.text.x = element_text(angle = 90, hjust = 1))

ggplot(data=Summary_PGeno, aes(SNames, mean.geno, color=Taxon, shape=Order)) +
   geom_point()+
   ggtitle("Plant genotype variation")+
   ylab("mean lesion size in cm2")+
   xlab("Botrytis Isolates")+
   theme(axis.text.x = element_text(angle = 90, hjust = 1))

ggplot(data=Summary_IsoGen , aes(Isolate, mean.isogen)) +
   geom_boxplot()+
   ggtitle("Variation across plant genotype")+
   ylab("mean lesion size in cm2")+
   xlab("Botrytis Isolates")+
   theme(axis.text.x = element_text(angle = 90, hjust = 1))

ggplot(data=Summary_IsoTax , aes(Isolate, mean.isotax)) +
   geom_boxplot()+
   ggtitle("Variation across plant species")+
   ylab("mean lesion size in cm2")+
   xlab("Botrytis Isolates")+
   theme(axis.text.x = element_text(angle = 90, hjust = 1))

3. Data quality control

This step remove the small size lesions that have high chance of being technical failure (for example the inoculation drops that did not contain Botrytis spores resulting in only the grape juice drop being phenotyped as lesion).

The pie charts provide further details to the table S5 about the lesion filtering and the percentage of lesion crossing the size and median thresholds.

Generated file: Eudicot7sp_clean_median.txt

source('~/Desktop/Github_Eudicot/3.Filtering_SmallLesionSize.R')

for (i in c(1:7)) {
  tmp_selection <- as.data.frame(summary(splt_Data_cleaning[[i]][,46]))
  rownames(tmp_selection) <- c("Above_Thresholds", "Size_Threshold_Only", "Median_Threshold_Only", "ToBeRemoved")
  tmpPerC <- round(tmp_selection[,1]/sum(tmp_selection[,1])*100)
  tmp_names <- paste(rownames(tmp_selection),tmpPerC, "%", sep=" ")
  pie(tmp_selection[,1], labels=tmp_names, main=Names[i], col=rainbow(length(tmp_names)))
}

4. Linear mixed models for each crop species

Lesion area was modeled independently for each species using the linear mixed model with lme4. Plant genotypes are nested within improvement level. Experimental replicate and trays as well as the individuals plants on which were collected the leaves for the detached leaf assay are considered as random factors

Associated file: Lmer_CleanM_7sp_Perc2.txt

# source('~/Desktop/Github_Eudicot/4.Modeling_clean_Data.R')
# This file contains all the mixed models for each 7 crop species.
# ! This code takes hours/days to run. !

Let’s look at the results (Figure 2B).

ModelRes <- read.table(file="Lmer_CleanM_7sp_Perc2.txt", header = T)

ModelRes$Order <- ModelRes$Variable
levels(ModelRes$Order) <- c(8,7, 5,6, 1,4, 2, 3 )
ModelRes$Order <- as.numeric(ModelRes$Order)
color_plot <- c("#762a83", "#e7d4e8", "#c2a5cf", "#762a83", "#5aae61", "#a6dba0", "#fdb863", "#c51b7d")

ggplot(ModelRes, aes(Species, Percent, fill=Variable)) +
  geom_col()+
  scale_fill_manual(values=c("#542788", "#d8daeb", "#e08214","#1b7837","#b8e186", "#878787","#bababa","#e0e0e0" ))+
  geom_text(aes(label=Percent), position=position_stack(vjust=0.5), size=3)+
  theme_minimal()

5. Calculate the Least-Square means

For each plant genotype, model corrected least-square means (LS-means) of lesion area were calculated for each strain from with Emmeans with the Satterwaite approximation.

# source('~/Desktop/Github_Eudicot/5.LSmeans_Genotype.R')
# (Remove the # to run it)

6. Data analysis on the Least-Square means

Violin Plot. How to plot Fig. 1

Describe the effects of plant domestication.

Associated file: Eudi_Ara_Lsmeans1_8sp_row_Geno_All_plot.txt

source('~/Desktop/Github_Eudicot/9.ViolinPlot.R')

ggplot(Eudi8sp, aes(Taxon1, LsMeans, fill=Wi_Do)) +
  geom_split_violin(scale="width") +
  geom_boxplot(width=0.15)+ 
  scale_fill_manual(values=c("#c7eae5", "#000000", "#878787", "#80cdc1"))+
  theme_minimal()

Meta-model on the 7 crop Eudicots

In this model,improvement levels and plant genotypes are nested within species to account for the phylogenetic common evolutionary history and possibly shared resistance traits of genotypes with low and high levels of improvement.

Associated file: LSmeanIndP_7spData_row_Geno.txt

source('~/Desktop/Github_Eudicot/7.MetaModel.R')

Heatmap. How to plot Fig. 3

To provide a global picture of how plant genotypes interact specifically with each Botrytis strains, a clustered heatmap was constructed on standardized LS-means with iheatmapr.

Associated file: Eudi_Ara_Lsmeans1_8sp_Col_Geno_All.txt

Z-scoring the LS-means

source('~/Desktop/Github_Eudicot/6.ZcoreLSmeans.R')

Heatmap:

source('~/Desktop/Github_Eudicot/8.Heatmap.R')
## Bootstrap (r = 0.5)... Done.
## Bootstrap (r = 0.6)... Done.
## Bootstrap (r = 0.69)... Done.
## Bootstrap (r = 0.8)... Done.
## Bootstrap (r = 0.9)... Done.
## Bootstrap (r = 1.0)... Done.
## Bootstrap (r = 1.1)... Done.
## Bootstrap (r = 1.2)... Done.
## Bootstrap (r = 1.3)... Done.
## Bootstrap (r = 1.39)... Done.
## Bootstrap (r = 0.49)... Done.
## Bootstrap (r = 0.59)... Done.
## Bootstrap (r = 0.69)... Done.
## Bootstrap (r = 0.79)... Done.
## Bootstrap (r = 0.89)... Done.
## Bootstrap (r = 1.0)... Done.
## Bootstrap (r = 1.1)... Done.
## Bootstrap (r = 1.2)... Done.
## Bootstrap (r = 1.3)... Done.
## Bootstrap (r = 1.4)... Done.
Eudicot_heatmap_stand

Bootstrapping test on dendrogram branches: Note: Here bootstrapping is only at 200 for a quick demo. Change to 20 000 for final results (expect ~1h of calculation).

plot(PGenost_iso_pv, main="Iso HClust 'complete'")
pvrect(PGenost_iso_pv, alpha = .95)

plot(Noadmixt_genoo_pv, main="Plant Genotypes Hierarchical Clustering method='complete'")
pvrect(Noadmixt_genoo_pv, alpha = .95)

Host specificity & virulence

source('~/Desktop/Github_Eudicot/10.CV_Lsmeans.R')

Spe_vir

B05_plot

Dav_plot

KT_Plot

UKR_plot